Abstract - In this paper, computational aspects of the panel aggregation problem are addressed. Motivated primarily by applications of risk assessment, an algorithm is developed for fusing large corpora of internally incoherent probability assessments. The algorithm is characterized by a provable performance guarantee, and is demonstrated to be orders of magnitude faster than existing tools when tested on several real-world data sets. In addition, unexpected connections between research in risk assessment and wireless sensor networks are exposed, as several key ideas are illustrated to be useful in both fields.

Keywords: aggregation, forecasting, fusion, risk assessment, 
sensor networks 



1 Introduction 

1.1 Aggregating Human Expertise 

In this paper, we address the problem of aggregating human expertise, motivated primarily by applications of risk assessment and analysis [12]. In these settings, a dearth of hard data often limits one's ability to extrapolate the future from the past. As a result, panels of human experts are frequently consulted to make forecasts about future events and to characterize the uncertainty therein. For example, stock market analysts are consulted to design risk-balanced investment portfolios, and geopolitical forecasters help construct robust policies and risk-based resource allocation schemes [2]. Typically, multiple experts are consulted in order to maximize the information available to the would-be decision-maker. However, the panel's generally disparate opinions often need to be fused to provide a single, coherent world view that is useful for decision-making and analysis.

This panel aggregation problem represents a classic example of information fusion wherein experts' forecasts must be combined for use by a centralized decision-maker. Under various models for the information provided by the experts, the aggregation problem has been usefully addressed in fields including philosophy, law, statistics, risk analysis and computer science. Recently, Osherson and Vardi [16] considered the case where human judges provide forecasts of subjective probability for both logically simple and complex events. A coherent approximation principle (CAP) was proposed as a generalization of linear averaging (see, e.g., [7], [?]). As discussed below, CAP is practically motivated, accommodating both incoherent (e.g., human) judges and partially specified forecasts. However, as noted in [16], implementing CAP is NP-hard in the general case. Thus, for problems of interest, implementing CAP exactly is computationally infeasible.

Also in [16], Osherson and Vardi propose a method for addressing CAP's computational challenge. Termed SAPA (Simulated Annealing over Probability Arrays), their algorithm applies to a very broad class of logically complex forecasts. Though vastly better than off-the-shelf tools, SAPA nonetheless requires many hours to aggregate forecasts provided by reasonably sized panels, so CAP remains of limited use in practice.

Nevertheless, in several experiments documented in [16], it was noted that on real-world data sets, fusing expertise using CAP (via SAPA) improves the forecasting accuracy of panel members according to several natural measures of stochastic accuracy (we elaborate on this finding below). This empirical result invites us to develop computationally efficient tools for implementing (or approximately implementing) CAP, so that these findings may be exploited in practice.

Thus, the primary motivation for this paper is CAP's 
computational challenge. Here, we derive a scalable algo- 
rithm for fusing forecasts of probability according to CAP. 
By exploiting the logical simplicity of the events in ques- 
tion, a convenient application of alternating projection al- 
gorithms provides a fast tool for risk assessment with a 
provable performance guarantee and documented empirical 
success. 



1.2 Wireless Sensor Networks 

A recurrent theme in the study of wireless sensor networks (WSNs) [?] is the need to exploit node-level intelligence when designing communication-efficient systems for distributed inference. With sensors that communicate inferences (rather than raw data), future WSNs will trade computational power for energy and bandwidth. This vision is a driver behind the demand for collaborative signal processing and for fusion strategies that aggregate inferences made by smart sensors. As alluded to above, researchers in risk assessment have long been interested in extracting robust and calibrated forecasts from human experts through collaboration and aggregation, and have developed a host of tools for doing so. Thus, a secondary motivation of this paper is to connect studies in risk assessment with research in sensor networks (and vice versa), and to expose a set of fundamental tools that may be useful for both.

1.3 Organization 

The remainder of this paper is organized as follows. In Sec- 
tion 2, we introduce notation and review alternating projec- 
tion algorithms, a tool that we exploit in deriving our scal- 
able aggregation algorithm. In Section 3, we formalize the 
panel aggregation problem as an instance of information fu- 
sion, review Osherson and Vardi's coherent approximation 
principle, and discuss its relation to other approaches to 
aggregation. In Section 4, we derive an iterative algorithm 
which approximately implements CAP and we discuss a the- 
orem which characterizes the algorithm's dynamics. In Sec- 
tion 5, we validate our approach with experiments on several 
real-world data sets. Finally, in Section 6, we discuss exten- 
sions of the current work and connections to collaborative 
signal processing in WSNs. 

2 Preliminaries 
2.1 Notation 

Let $X = (X_1, \ldots, X_n)$ be a vector of Boolean¹ variables. Each component of $X$ models a basic event. For example, the event that "Google stock outperforms the NASDAQ in the third quarter" may be described by a Boolean variable $X_1$ whose value is 1 if the event is true and 0 otherwise. $X$ therefore models a set of $n$ basic events, which could describe the performance of a set of stocks, the status of various economic indicators, the outcome of geopolitical events, etc.

Complex events are modeled by joining the components of $X$ with logical connectives such as $\{\neg, \wedge, \vee, \ldots\}$. For example, the complex event that "Google stock outperforms the NASDAQ AND the U.S. GDP increases in the third quarter" may be modeled by the conjunction $X_1 \wedge X_2$, with $X_2$ appropriately chosen. In a slight abuse of notation, we henceforth refer to components of $X$ and logical combinations thereof as basic events and complex events, respectively.

A forecast $(E, p)$ is an event $E$ (basic or complex) paired with a real number $p \in [0,1]$; $p$ is interpreted as an assessment of the probability that the event $E$ is true. In the sequel, we deal with collections of forecasts $\{(E_i, p_i)\}_{i=1}^m$, for which an important concept is probabilistic coherence.

¹ The assumption that the variables are Boolean is made merely to simplify exposition; all the subsequent discussion and results hold for more general multi-valued discrete variables.

Definition 1 A set of forecasts $\{(E_i, p_i)\}_{i=1}^m$ is probabilistically coherent if and only if it is implied by a joint probability distribution over $X$.

The following easy-to-prove lemma is important for the 
subsequent development. 

Lemma 1 Let $C = C(\{E_i\}_{i=1}^m) \subseteq [0,1]^m$ be the set such that $\{(E_i, p_i)\}_{i=1}^m$ is probabilistically coherent if and only if $p = (p_i)_{i=1}^m \in C$. Then, for any set of events $\{E_i\}_{i=1}^m$, $C$ is closed and convex.
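For the simplest complex event used throughout this paper, a conjunction together with its two conjuncts, the set $C$ of Lemma 1 can be written out explicitly. A joint distribution over two basic events $p, q$ is fixed by the masses of the four atoms $p \wedge q$, $p \wedge \neg q$, $\neg p \wedge q$, $\neg p \wedge \neg q$, and a forecast triple $(P(p), P(q), P(p \wedge q))$ pins those masses down, so the triple is coherent exactly when all four implied masses are non-negative. These are linear inequalities, so in this case $C$ is a closed convex polytope, as Lemma 1 asserts. The Python sketch below checks the condition on judges from Table 2; the function name is our own, chosen for illustration.

```python
def is_coherent_conjunction(p_a, p_b, p_ab, tol=1e-12):
    """Return True if forecasts for p, q and (p AND q) are probabilistically coherent.

    The implied atom masses are P(p&q) = p_ab, P(p&~q) = p_a - p_ab,
    P(~p&q) = p_b - p_ab and P(~p&~q) = 1 - p_a - p_b + p_ab. They always sum
    to 1, so the triple comes from a joint distribution iff all are >= 0.
    """
    atoms = (p_ab, p_a - p_ab, p_b - p_ab, 1.0 - p_a - p_b + p_ab)
    return all(mass >= -tol for mass in atoms)


# Forecasts (p, q, p AND q) from Table 2.
alice = (0.75, 0.20, 0.10)
chris = (0.95, 0.00, 0.60)
print(is_coherent_conjunction(*alice))  # True
print(is_coherent_conjunction(*chris))  # False: P(p AND q) > P(q)
```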

2.2 Alternating Projection Algorithms 

Let $C_1, \ldots, C_l$ be closed, convex subsets of $\mathbb{R}^m$ whose intersection $C = \bigcap_{i=1}^l C_i$ is non-empty. For any $x \in \mathbb{R}^m$, let $P_C(x)$ denote the least-squares projection of $x$ onto $C$, i.e.,

$$P_C(x) := \arg\min_{\hat{x} \in C} \|\hat{x} - x\|_2.$$

Alternating projection algorithms [?] provide a way to compute $P_C(\cdot)$ given $\{P_{C_i}(\cdot)\}_{i=1}^l$. The von Neumann-Halperin algorithm, depicted in Table 1, is one natural example of this approach.



Initialize: $x_0 := x$
Iterate: $x_{n+1} := P_{C_{(n \bmod l) + 1}}(x_n)$

Table 1: The von Neumann-Halperin Algorithm

In words, the algorithm successively and iteratively projects onto each of the subsets. In the case where $C_i$ is a linear subspace for all $i \in \{1, \ldots, l\}$, this algorithm was first studied by von Neumann and subsequently by Halperin. Much of the behavior of this algorithm can be understood through Theorem 1, the proof of which can be found in [?].

Theorem 1 Let $\{C_i\}_{i=1}^l$ be a collection of closed, convex subsets of $\mathbb{R}^m$ whose intersection $C = \bigcap_{i=1}^l C_i$ is nonempty. Let $x_n$ be defined as in the von Neumann-Halperin algorithm. Then, for every $\hat{x} \in C$ and every $n \ge 1$,

$$\|x_n - \hat{x}\|_2 \le \|x_{n-1} - \hat{x}\|_2.$$

Moreover, $\lim_{n \to \infty} x_n \in \bigcap_{i=1}^l C_i$. If $C_i$ is affine for all $i \in \{1, \ldots, l\}$, then $\lim_{n \to \infty} \|x_n - P_C(x)\|_2 = 0$.
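To make Table 1 and Theorem 1 concrete, here is a minimal sketch of the von Neumann-Halperin iteration for two affine sets in $\mathbb{R}^2$, the lines $y = 0$ and $y = x$, whose only common point is the origin. The helper names are hypothetical. The sketch checks the two properties that Theorem 1 guarantees in this setting: the distance to a point of $C$ never increases, and, because both sets are affine, the iterates approach $P_C(x)$, which here is the origin.

```python
import math


def project_onto_hyperplane(x, a, b):
    """Least-squares projection of x onto the affine set {z : a.z = b}."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    norm_sq = sum(ai * ai for ai in a)
    return [xi - residual / norm_sq * ai for ai, xi in zip(a, x)]


def von_neumann_halperin(x0, projections, num_iters):
    """Table 1: x_{n+1} = P_{C_{(n mod l)+1}}(x_n); returns x_0, ..., x_{num_iters}."""
    iterates = [list(x0)]
    for n in range(num_iters):
        project = projections[n % len(projections)]
        iterates.append(project(iterates[-1]))
    return iterates


def onto_line_y_eq_0(x):
    return project_onto_hyperplane(x, [0.0, 1.0], 0.0)


def onto_line_y_eq_x(x):
    return project_onto_hyperplane(x, [1.0, -1.0], 0.0)


iterates = von_neumann_halperin([3.0, 1.0], [onto_line_y_eq_0, onto_line_y_eq_x], 200)

# Distance to the point (0, 0) of C never increases (first claim of Theorem 1).
dists = [math.hypot(x, y) for x, y in iterates]
assert all(d_next <= d + 1e-15 for d, d_next in zip(dists, dists[1:]))
print(iterates[:3], dists[-1])
```

Each pair of projections halves the distance to the origin in this example, so after 200 projections the final iterate is numerically indistinguishable from $P_C(x)$.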

Often examined in the context of the convex feasibility 
problem, the von Neumann-Halperin algorithm has been 
generalized in various ways to address more general con- 
vex sets and non-orthogonal projections; accordingly, the 
algorithm often takes on other names (e.g., Bregman's al- 
gorithm, Dykstra's algorithm). 
3 The Panel Aggregation Problem

3.1 A Model

Suppose that each of $m$ judges assesses the likelihood of a set of events; let $\mathcal{F}_i = \{(E_{ij}, \hat{p}_{ij})\}_{j=1}^{n_i}$ denote the set of forecasts provided by judge $i$. We assume that the events that make up $\mathcal{F}_i$ are defined over the same $X$ for all $i = 1, \ldots, m$; however, we make no additional assumptions regarding the logical relationship between the events in $\mathcal{F}_i$ and $\mathcal{F}_k$. In other words, we assume that panel members provide forecasts for the same "problem domain", but may assess the likelihood of altogether different, though perhaps logically related, events. With this model, the panel aggregation problem can be stated as follows:

Given the judges' forecasts $\{\mathcal{F}_i\}_{i=1}^m$, derive a coherent set of forecasts that jointly reflects the panel's expertise.

3.2 The Coherent Approximation Principle

Osherson and Vardi [16] propose a coherent approximation principle (CAP) for addressing the panel aggregation problem. In particular, they suggest aggregating the panel's expertise by solving the following optimization problem:

$$\min \sum_{i=1}^{m} \sum_{j=1}^{n_i} |p_{ij} - \hat{p}_{ij}|^2 \quad \text{s.t.} \quad \bigcup_{i=1}^{m} \{(E_{ij}, p_{ij})\}_{j=1}^{n_i} \text{ is coherent.} \qquad (1)$$

Here, the optimization variables are $\{p_{ij}\}$; the events in $\{E_{ij}\}$ and the probability assessments $\{\hat{p}_{ij}\}$ are the program data. Consistent with the definition of the panel aggregation problem in Section 3.1, the output of CAP is a coherent set of forecasts for the events in $\{E_{ij}\}$, and not (necessarily) a joint probability distribution over $X$.

By solving (1), one finds the coherent forecasts that are minimally different (with respect to squared deviation) from those provided by the panel, intuitively preserving the "information" provided by the judges while gaining probabilistic coherence. From a statistical perspective, solving (1) can be interpreted as finding the maximum-likelihood coherent forecasts $\{p_{ij}\}$ given observations $\{\hat{p}_{ij}\}$ corrupted by additive white Gaussian noise. Finally, CAP offers a geometric interpretation: by Lemma 1, there exists a closed convex set $C = C(\{E_{ij}\})$ that defines the numbers which comprise coherent forecasts for the events in question; $\hat{p}$, a vector concatenation of $\{\hat{p}_{ij}\}$, generally lies outside this set. CAP suggests fusing the panel's expertise by computing the orthogonal projection of $\hat{p}$ onto $C$. Henceforth, the forecasts determined by solving (1) will be referred to as the CAP-aggregate for the panel.

As discussed in [16], solving (1) (and therefore implementing CAP) is NP-hard in the general case. In particular, note that checking whether a set of forecasts $\{(E_{ij}, \hat{p}_{ij})\}$ is probabilistically coherent can be reduced to solving (1), and checking for probabilistic coherence is strictly more general than checking whether the formulae that describe the events $\{E_{ij}\}$ are mutually satisfiable.

3.3 Related Work

The literature on the panel aggregation problem is expansive, as it has been touched upon in philosophy, law, statistics, risk analysis, and computer science; we refer the interested reader to the brief survey in [16] for an entry point. Here, we discuss the literature immediately relevant to CAP, and augment the survey in [16] with a discussion of related work in computer science.

Linear averaging [?], [?] is arguably the most popular aggregation principle, given its simplicity, various axiomatic justifications, and documented empirical success. To illustrate this natural approach, consider the panel exhibited in Table 2. Here, three judges provide forecasts for three events: a conjunction and its conjuncts. The "Aggregate" forecast is the simple unweighted average of the three judges' forecasts. Though appealing, linear averaging is not without pitfalls, as a few examples illustrate.

For instance, an underlying assumption in linear averaging is that each judge is probabilistically coherent. Averaging is appropriate under this assumption since (by Lemma 1) the linear averaged aggregate is probabilistically coherent whenever the individual judges are coherent and forecast the same events. However, in applications of interest, the judges are humans, who are notoriously incoherent. For example, the conjunction fallacy, a robust finding from psychology [21], [?], demonstrates that human judges (even experts!) often assign higher probability to a conjunction than to its conjuncts. Table 2 illustrates such a case. In particular, note that Chris is incoherent since the probability assigned to the event $p \wedge q$ is greater than the probability assigned to $q$, i.e., 0.6 > 0.0; the linear averaged aggregate is similarly incoherent. Thus, though linear averaging naturally addresses inter-judge disagreement, it will not in general provide a coherent aggregate when individual judges are themselves incoherent.

A clever analyst may circumvent this problem by soliciting forecasts only for logically independent events. Such a strategy may work in isolated cases, but it is not a general solution and may ultimately require the analyst to ignore subtleties in the experts' forecasts. For example, a market analyst may complement a forecast concerning the NASDAQ by forecasting a correlation between the NASDAQ and currency exchanges; a geopolitical expert may assess the likelihood of a terror attack in a particular city and also forecast the probability of an attack in any city. In short, there is information to be gleaned from forecasts for logically complex events: practical aggregation principles should recognize this fact while accommodating intra-judge incoherence.





              p      q      p ∧ q
Alice         0.75   0.20   0.10
Bob           0.60   0.10   0.10
Chris         0.95   0.00   0.60
Aggregate     0.67   0.10   0.20

Table 2: Linear Averaging: Incoherent Judges

In practice, human judges may be unable or unwilling to offer forecasts for every event in question. Communication constraints may preclude judges from collaborating, or individual judges may find themselves unqualified to forecast the likelihood of particular events. To reuse an earlier example, an analyst may be unwilling to forecast events pertaining to the technology sector but may be willing to discuss the correlation between the NASDAQ and the currency exchanges. Such a case is illustrated in Table 3, where each judge provides an incomplete but coherent set of forecasts. The incoherence of the pairwise-average aggregate demonstrates that linear averaging is also inappropriate when judges provide only partial forecasts.

Given the aforementioned limitations of linear averag- 
ing, a natural question arises: how should one aggregate 
(i.e., fuse) the opinion expressed by incoherent judges on 
overlapping but generally different sets of logically complex 
events? CAP addresses this question by generalizing lin- 
ear averaging. In particular, note that the CAP aggregate 
equals the un-weighted averaged aggregate whenever prob- 
abilistically coherent judges provide forecasts for the same 
set of events. 
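The claim that CAP reduces to unweighted averaging rests on a short argument. When all $m$ judges forecast the same events, $\sum_i \|p - \hat{p}_i\|^2 = m\|p - \bar{p}\|^2 + \text{const}$, where $\bar{p}$ is the average forecast vector, so (1) amounts to projecting $\bar{p}$ onto $C$; if every judge is coherent, $\bar{p}$ already lies in the convex set $C$ of Lemma 1 and the projection leaves it unchanged. The sketch below checks the identity numerically for Alice and Bob from Table 2, both of whom are coherent; the function names are illustrative only.

```python
import random


def cap_objective(p, judges):
    """Objective (1) when every judge forecasts the same events: sum_i ||p - p_i||^2."""
    return sum(sum((pk - jk) ** 2 for pk, jk in zip(p, judge)) for judge in judges)


def average(judges):
    """Component-wise unweighted average of the judges' forecast vectors."""
    m = len(judges)
    return [sum(column) / m for column in zip(*judges)]


def is_coherent_conjunction(p_a, p_b, p_ab, tol=1e-12):
    """Coherence of (P(p), P(q), P(p AND q)): all four implied atom masses >= 0."""
    atoms = (p_ab, p_a - p_ab, p_b - p_ab, 1.0 - p_a - p_b + p_ab)
    return all(mass >= -tol for mass in atoms)


# Alice and Bob from Table 2: coherent, and both forecast (p, q, p AND q).
judges = [(0.75, 0.20, 0.10), (0.60, 0.10, 0.10)]
p_bar = average(judges)
const = cap_objective(p_bar, judges)

# For any p: sum_i ||p - p_i||^2 == m * ||p - p_bar||^2 + const.
rng = random.Random(0)
for _ in range(5):
    p = [rng.random() for _ in range(3)]
    rhs = len(judges) * sum((a - b) ** 2 for a, b in zip(p, p_bar)) + const
    assert abs(cap_objective(p, judges) - rhs) < 1e-12

# (1) therefore projects p_bar onto C; p_bar is coherent, so it is the CAP-aggregate.
print(p_bar, is_coherent_conjunction(*p_bar))
```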





              p      q      p ∧ q
Alice         0.75   0.20   NA
Bob           0.60   NA     0.40
Chris         NA     0.00   0.00
Aggregate     0.67   0.10   0.20

Table 3: Linear Averaging: Partial Forecasts
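The failure shown in Table 3 can be reproduced by averaging each event over only the judges who answered. The sketch below encodes "NA" as `None`, an encoding choice of ours rather than the paper's; it yields 0.675 for $p$, which Table 3 reports as 0.67. The averaged forecast for $p \wedge q$ exceeds that for $q$, so the aggregate is incoherent even though each judge's partial forecasts are coherent.

```python
def average_ignoring_abstentions(forecasts):
    """Average each event's forecasts over the judges who did not abstain (None)."""
    aggregate = []
    for column in zip(*forecasts):
        given = [f for f in column if f is not None]
        aggregate.append(sum(given) / len(given))
    return aggregate


# Table 3, columns (p, q, p AND q); None stands for "NA".
table_3 = [
    (0.75, 0.20, None),  # Alice
    (0.60, None, 0.40),  # Bob
    (None, 0.00, 0.00),  # Chris
]
p, q, p_and_q = average_ignoring_abstentions(table_3)
print(q, p_and_q)  # 0.1 0.2
print(p_and_q <= q)  # False: P(p AND q) > P(q), so the aggregate is incoherent
```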

Lindley et al. [13] consider a Bayesian approach to reconciling probability forecasts, whereby "noisy" observations $\{\hat{p}_{ij}\}$ are assumed to arise from a coherent set $\{p_{ij}\}$. CAP can be viewed as a special case of their model since, as discussed above, the solution to (1) admits a statistical interpretation as the maximum-likelihood coherent forecasts given observations $\{\hat{p}_{ij}\}$ corrupted by additive white Gaussian noise. However, note that [13] sought to eliminate incoherence from a single judge, whereas CAP was introduced to address the panel aggregation problem. Moreover, Osherson and Vardi were motivated by non-statistical interpretations of CAP and, as here, addressed the computational issue of implementing CAP.

A panel aggregation problem is also addressed in the "online" learning model, which is frequently studied in learning theory [?], [?]. In that setting, a panel of experts predicts the true outcome of a set of events. A central agent constructs its own forecast by fusing the experts' predictions and, upon learning the truth, suffers a loss, sometimes specified by a quadratic penalty function. In repeated trials, the agent updates its fusion rule (e.g., the "weights" in a weighted average), taking into account the performance of each expert. Under minimal assumptions on the evolution of these trials, bounds are derived that compare the trial-averaged performance of the central agent with that of the best (weighted combination of) expert(s). In contrast to the current framework, the online model typically assumes that each expert provides a forecast for the same event or partition of events. Thus, fusion strategies such as weighted averaging are appropriate in the online model, for the same reasons discussed above. Also, observe that the present model concerns a single "trial", not many.

Finally, proponents of Dempster-Shafer theory [?] (and associated fusion rules) object to probability as an idiom for belief, in part because of its inability to distinguish uncertainty from ignorance. The merits of Dempster-Shafer aside, one could argue for abstention as an expression of ignorance. As the preceding examples illustrate, even abstaining experts may disagree (i.e., experts' forecasts may be mutually incoherent), and therefore the panel aggregation problem remains. Thus, CAP is a natural aggregation principle in the setting where judges express uncertainty with probability and ignorance through abstention; it thereby extends the utility of probabilistic forecasts by letting experts express richer beliefs through abstention.

4 A Scalable Approach 

In principle, implementing CAP by solving (1) can be accomplished using quadratic programming. In the general case, this approach requires a representation of joint distributions on $X$, for which $O(2^n)$ free variables are necessary. For panels that assess relatively small numbers of events, the quadratic programming approach is nonetheless feasible. In cases of interest, however, hundreds of judges forecast thousands of events, and off-the-shelf tools for solving quadratic programs do not scale.

Nevertheless, the logical complexity of the events assessed by human judges is usually bounded. For example, experts are often constrained to forecast events with no more than three literals (e.g., three-term conjunctions). The idea at the heart of our approach is to exploit such logical simplicity by decomposing (1) into a collection of small subproblems, each of which can be solved quickly using off-the-shelf tools.

We now present our main result, a general algorithm for aggregating large corpora of probability forecasts. To aid exposition, let us do away with the multi-judge distinction by assuming that there is a single body of forecasts $\mathcal{F} = \{(E_i, \hat{p}_i)\}_{i=1}^m$. We do so without loss of generality, since we may construct $\mathcal{F}$ by pooling all the judges' forecasts into a single set. Also, let us assume that every event in $\{E_i\}_{i=1}^m$ is unique. Below, we demonstrate how this assumption may be relaxed.

4.1 A General Algorithm 

To state our general algorithm, it is helpful to introduce a notion of local coherence. Let $\{(E_i, p_i)\}_{i=1}^m$ be a collection of forecasts and let $\sigma \subseteq \{1, \ldots, m\}$. The requirement that $\{(E_i, p_i)\}_{i=1}^m$ be probabilistically coherent can be relaxed by requiring only that the subset $\{(E_i, p_i)\}_{i \in \sigma}$ be coherent. For notational convenience, we henceforth say that $\{(E_i, p_i)\}_{i=1}^m$ is locally coherent with respect to $\sigma$ whenever $\{(E_i, p_i)\}_{i \in \sigma}$ is coherent.

With this formalism, note that "global" coherence is recovered by taking $\sigma = \{1, \ldots, m\}$. Moreover, any probabilistically coherent set $\{(E_i, p_i)\}_{i=1}^m$ must be locally coherent with respect to $\sigma$ for all $\sigma \subseteq \{1, \ldots, m\}$.
With that, let us relax (1) by choosing a collection of subsets $\{\sigma_j\}_{j=1}^l$ and defining the following optimization problem:

$$\min \sum_{i=1}^m |p_i - \hat{p}_i|^2 \quad \text{s.t.} \quad \{(E_i, p_i)\}_{i \in \sigma_j} \text{ is coherent} \quad \forall j = 1, \ldots, l. \qquad (2)$$



To emphasize, (2) is a relaxation of (1), since in general local coherence does not imply global coherence. However, this relaxation permits a geometric interpretation as a projection onto the intersection of $l$ convex sets. Thus, alternating projection algorithms are applicable to (2). In particular, an algorithm for addressing (2) is detailed in Table 4; note that it is exactly the von Neumann-Halperin algorithm interpreted in the language of the panel aggregation problem.



Input: $\{(E_i, \hat{p}_i)\}_{i=1}^m$
Initialize: Auxiliary forecasts $\{(E_i, q_i)\}_{i=1}^m$, with $q_i := \hat{p}_i$.
Step 1: Design $\{\sigma_j\}_{j=1}^l$ with $\sigma_j \subseteq \{1, \ldots, m\}$.
Step 2: for $t = 1, \ldots, T$
  for $j = 1, \ldots, l$
    $p_{tj} := \arg\min_{p} \sum_{i=1}^m |p_i - q_i|^2$ s.t. $\{(E_i, p_i)\}_{i \in \sigma_j}$ is coherent.
    Update $\{(E_i, q_i)\}_{i \in \sigma_j} \leftarrow \{(E_i, p_{tj,i})\}_{i \in \sigma_j}$.
Output: $\{(E_i, q_i)\}_{i=1}^m$.

Table 4: A Scalable Approach to Aggregation

In this algorithm, computation occurs in the inner loop, when projecting $q$ onto a set of local coherence constraints. This computation involves only $|\sigma_j|$ forecasts, since $p_{tj,i} = q_i$ for all $i \notin \sigma_j$, and can be carried out using off-the-shelf tools for quadratic programming or more specialized tools like SAPA.

The crucial step in this algorithm is Step 1, designing $\{\sigma_j\}_{j=1}^l$. Intuitively, the fewer events each subset contains, the faster each inner computation can run. However, as subsets get larger, a richer set of coherence constraints is represented and thus the solution to (2) more closely approximates the CAP-aggregate. When designing $\{\sigma_j\}_{j=1}^l$, one must therefore strike a balance between approximation and speed.

A natural way to make this trade-off is by exploiting the logical simplicity of the events in question. To illustrate, consider the case where the events in $\{E_i\}_{i=1}^m$ are constrained to be basic events, negations of basic events, and two-term conjunctions (or disjunctions) of the basic events or their negations. A sample set of events that meets these criteria is drawn in Figures 1, 2, and 3 (ignoring the dashed lines for a moment).

The linear averaging approach to aggregation can be viewed as a special case of this general method, where one subset is chosen per event; these subsets are depicted by the dashed lines in Figure 1. This highly local approach can be implemented very quickly; however, the solution to (2) may poorly approximate the CAP-aggregate since few of the coherence constraints are represented. CAP, on the other hand, groups all the events into a single subset, requiring global coherence; this case is depicted in Figure 2. The CAP approach represents all the coherence constraints but, as discussed above, is computationally infeasible in practice.

A cleverer design may select subsets according to the logical relationship between the events in question. In Figure 3, for example, basic events are grouped with their negations, and conjunctions (disjunctions) with their corresponding conjuncts (disjuncts). By choosing all subsets of this form, we enforce a very strong set of local coherence constraints; crucially, however, each subset contains at most three events. Intuitively, solving (2) using these subsets should quickly approximate the CAP-aggregate, given the balance struck between approximation and speed. This intuition is borne out in the experiments.

Figure 1: Linear Averaging (panels: Basic Events; Complex Events)

Figure 2: CAP (complex events include, e.g., $X_1 \wedge X_2$ and $X_2 \wedge X_n$)

Figure 3: A Scalable Approach
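The local problems in Step 2 are small quadratic programs. For a Figure 3 subset consisting of a conjunction and its two conjuncts, the coherent set is the intersection of four half-spaces (the implied atom masses $P(p \wedge q)$, $P(p) - P(p \wedge q)$, $P(q) - P(p \wedge q)$ and $1 - P(p) - P(q) + P(p \wedge q)$ must be non-negative), so its least-squares projection can be computed with Dykstra's algorithm, one of the alternating-projection variants mentioned in Section 2.2. This is a substitute we chose for illustration, not the deterministic rules alluded to in Section 5.2, which the paper does not give; the function names and the fixed cycle count are assumptions.

```python
def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {z : a.z <= b}."""
    excess = sum(ai * xi for ai, xi in zip(a, x)) - b
    if excess <= 0.0:
        return list(x)
    norm_sq = sum(ai * ai for ai in a)
    return [xi - excess / norm_sq * ai for ai, xi in zip(a, x)]


# Coherence of (P(p), P(q), P(p AND q)) as four half-spaces a.z <= b,
# one per atom mass that must stay non-negative.
CONJUNCTION_HALFSPACES = [
    ([0.0, 0.0, -1.0], 0.0),  # P(p AND q) >= 0
    ([-1.0, 0.0, 1.0], 0.0),  # P(p AND q) <= P(p)
    ([0.0, -1.0, 1.0], 0.0),  # P(p AND q) <= P(q)
    ([1.0, 1.0, -1.0], 1.0),  # P(p) + P(q) - P(p AND q) <= 1
]


def project_conjunction_triple(y, num_cycles=500):
    """Least-squares projection of y onto the coherent set via Dykstra's algorithm."""
    x = list(y)
    # One correction vector per half-space; these make Dykstra converge to the
    # projection itself rather than to an arbitrary point of the intersection.
    corrections = [[0.0, 0.0, 0.0] for _ in CONJUNCTION_HALFSPACES]
    for _ in range(num_cycles):
        for k, (a, b) in enumerate(CONJUNCTION_HALFSPACES):
            z = [xi + ci for xi, ci in zip(x, corrections[k])]
            x = project_halfspace(z, a, b)
            corrections[k] = [zi - xi for zi, xi in zip(z, x)]
    return x


# Chris and the linear-averaged aggregate from Table 2.
for name, triple in [("Chris", (0.95, 0.00, 0.60)), ("Aggregate", (0.67, 0.10, 0.20))]:
    print(name, [round(v, 4) for v in project_conjunction_triple(triple)])
```

For Chris's forecasts, only the constraint $P(p \wedge q) \le P(q)$ is violated, and projecting onto that single half-space already yields a coherent triple, $(0.95, 0.30, 0.30)$, so Dykstra's algorithm reaches it in its first cycle; the averaged aggregate is likewise mapped to $(0.67, 0.15, 0.15)$.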
Comments 



First, let us emphasize that $\{\sigma_j\}_{j=1}^l$ is a design parameter. Depending on the design, the output forecasts may or may not be coherent; recall that the algorithm addresses (2), a relaxation of CAP. Intuitively, however, for any $\{\sigma_j\}_{j=1}^l$ the output will be closer to coherence, since it will satisfy a set of local coherence constraints. This intuition is formalized by Theorem 2 below.
Second, in Step 2, the local coherence constraints are addressed in sequence. Note that this ordering is non-essential and parallelism may be introduced. In particular, two projections can occur simultaneously as long as there is no overlap between the events in question (i.e., two projections cannot change the same variables simultaneously).
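One simple way to exploit this parallelism, sketched below under our own assumptions, is to group the subsets greedily into batches whose members share no index. The projections within a batch touch disjoint coordinates of $q$ and could run concurrently, while the batches run one after another. The function name and the example subsets are hypothetical, and greedy grouping is only one possible scheduling choice.

```python
def disjoint_batches(subsets):
    """Greedily group index subsets into batches whose members share no index.

    Projections for the subsets in one batch touch disjoint coordinates, so
    they could run concurrently; batches are processed one after another.
    """
    batches = []  # each entry: (indices used so far, subsets in the batch)
    for subset in subsets:
        for used, members in batches:
            if used.isdisjoint(subset):
                used.update(subset)
                members.append(subset)
                break
        else:  # no existing batch can take this subset, so open a new one
            batches.append((set(subset), [subset]))
    return [members for _, members in batches]


# Hypothetical subsets sigma_j over forecast indices 0..5.
sigmas = [{0, 1, 2}, {3, 4}, {1, 5}, {2, 3}, {5}]
for batch in disjoint_batches(sigmas):
    print(batch)
```

With these sample subsets, the first batch collects {0, 1, 2}, {3, 4} and {5}, and the second holds {1, 5} and {2, 3}. Because batching changes the sweep order, the iterates differ from those of a strictly sequential sweep, which the remark on ordering above permits.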

Next, the assumption that each event in $\{E_i\}_{i=1}^m$ is unique can be relaxed. For any set $\{(E_i, \hat{p}_i)\}_{i=1}^m$ with $N_j$ forecasts for each unique event $F_j \in \{E_i\}_{i=1}^m$, one can construct a new set $\{(F_j, \hat{q}_j)\}$ with $\hat{q}_j = \frac{1}{N_j} \sum_{i : E_i = F_j} \hat{p}_i$. Then, solving

$$\min \sum_j N_j |p_j - \hat{q}_j|^2 \quad \text{s.t.} \quad \{(F_j, p_j)\} \text{ is coherent}$$

is equivalent to solving (1) with $\{(E_i, \hat{p}_i)\}_{i=1}^m$, since the two objectives differ only by a constant. The same trick can be applied to the relaxation (2), and the algorithm in Table 4 can be adjusted similarly.
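The equivalence rests on the identity $\sum_{i : E_i = F_j} (p_j - \hat{p}_i)^2 = N_j (p_j - \hat{q}_j)^2 + \sum_{i : E_i = F_j} (\hat{p}_i - \hat{q}_j)^2$, whose last term does not depend on $p_j$. The sketch below checks numerically that the pooled and collapsed objectives differ by the same constant at randomly drawn points; the event labels and forecast values are illustrative, not taken from the data sets in Section 5, and the function names are our own.

```python
import random


def collapse_duplicates(events, forecasts):
    """Map each unique event F_j to (N_j, q_j): its count and mean forecast."""
    grouped = {}
    for event, forecast in zip(events, forecasts):
        grouped.setdefault(event, []).append(forecast)
    return {e: (len(fs), sum(fs) / len(fs)) for e, fs in grouped.items()}


def pooled_objective(p, events, forecasts):
    """Objective of (1) on the pooled forecasts: sum_i (p[E_i] - p_hat_i)^2."""
    return sum((p[e] - f) ** 2 for e, f in zip(events, forecasts))


def weighted_objective(p, collapsed):
    """Collapsed objective: sum_j N_j (p[F_j] - q_j)^2."""
    return sum(n * (p[e] - q) ** 2 for e, (n, q) in collapsed.items())


# Illustrative pooled forecasts in which "p" appears three times and "q" twice.
events = ["p", "q", "p&q", "p", "q", "p"]
forecasts = [0.75, 0.20, 0.10, 0.60, 0.00, 0.95]
collapsed = collapse_duplicates(events, forecasts)

# The gap between the two objectives should be the same constant at every point.
rng = random.Random(1)
gaps = []
for _ in range(5):
    p = {e: rng.random() for e in collapsed}
    gaps.append(pooled_objective(p, events, forecasts) - weighted_objective(p, collapsed))
print(collapsed, max(gaps) - min(gaps))
```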

Finally, the algorithm depicted in Table 4 permits a performance guarantee. In particular, assume that after learning the "truth" of the events in question, the accuracy of the forecasts in $\{(E_i, p_i)\}_{i=1}^m$ is assessed using the Brier score [?], a quadratic penalty:

$$QP(\{(E_i, p_i)\}) = \sum_{i : E_i = \text{TRUE}} (1 - p_i)^2 + \sum_{i : E_i = \text{FALSE}} (0 - p_i)^2 \qquad (3)$$
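A direct transcription of (3) follows, as a minimal sketch with a hypothetical function name; lower scores are better. The example scores Chris's forecasts from Table 2 under one realizable outcome ($p$ true and $q$ false, hence $p \wedge q$ false), first as given and then after the coherent adjustment $(0.95, 0.30, 0.30)$ obtained by projecting onto the constraint $P(p \wedge q) \le P(q)$.

```python
def brier_score(forecasts, truths):
    """Quadratic penalty (3): (1 - p)^2 for each true event, (0 - p)^2 for each false one."""
    return sum((1.0 - p) ** 2 if is_true else (0.0 - p) ** 2
               for p, is_true in zip(forecasts, truths))


# One realizable outcome for (p, q, p AND q): p true, q false, so p AND q is false.
outcome = [True, False, False]
print(brier_score([0.95, 0.00, 0.60], outcome))  # Chris, as given in Table 2
print(brier_score([0.95, 0.30, 0.30], outcome))  # after the coherent adjustment
```

Up to floating-point rounding, the two scores are 0.3625 and 0.1825, so in this world the adjustment lowers the penalty.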

The algorithm in Table 4 offers a stepwise improvement 
in accuracy as measured by the Brier score, independent of 
the truth or falsity of the events in question. Theorem 2 
formalizes this important fact. 

Theorem 2 Let $\{(E_i, q_{T,i})\}_{i=1}^m$ denote the set of forecasts output by the algorithm after running $T$ iterations with input forecasts $\{(E_i, \hat{p}_i)\}_{i=1}^m$. Then,

$$QP(\{(E_i, q_{T,i})\}) \le QP(\{(E_i, q_{T-1,i})\})$$

under every realizable truth assignment to the events in $\{E_i\}_{i=1}^m$. Moreover, as $T \to \infty$, the output forecasts converge (i.e., $q_T$ converges in norm), and are locally coherent with respect to $\sigma_j$ for all $j = 1, \ldots, l$.

The proof of Theorem 2 follows from Theorem 1 and de Finetti's Theorem [?], [?]. If $\{(E_i, \hat{p}_i)\}_{i=1}^m$ contains a single judge's forecasts (i.e., the algorithm is applied to eliminate intra-judge incoherence), then Theorem 2 predicts a stepwise improvement in the accuracy of that judge. If instead $\{(E_i, \hat{p}_i)\}_{i=1}^m$ contains a panel's forecasts, then Theorem 2 predicts that at each step, a randomly selected judge will improve on average.

Note that $T$, the number of iterations through the forecasts, is a second design parameter for this algorithm that in principle must be tuned. However, Theorem 2 demonstrates a sense in which performance is monotonic in $T$. Moreover, for any $T$, the output forecasts will be at least as accurate as the input forecasts (with respect to the Brier score), independent of the truth or falsity of the events in question.
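The following self-contained sketch runs the algorithm of Table 4 on a toy instance and checks Theorem 2 directly. The events are $A$, $\neg A$, $B$, $\neg B$ and $A \wedge B$; the subsets follow Figure 3 (each basic event with its negation, and the conjunction with its conjuncts); and the input forecasts are invented for illustration. Two local projections are assumed: a closed-form projection onto the segment $\{(t, 1-t) : 0 \le t \le 1\}$ for a negation pair, and Dykstra's algorithm for the conjunction triple. The check enumerates all four realizable truth assignments and asserts that the Brier score (3) never increases from one iteration to the next. All function names are hypothetical.

```python
from itertools import product


def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {z : a.z <= b}."""
    excess = sum(ai * xi for ai, xi in zip(a, x)) - b
    if excess <= 0.0:
        return list(x)
    norm_sq = sum(ai * ai for ai in a)
    return [xi - excess / norm_sq * ai for ai, xi in zip(a, x)]


# Coherence of (P(x), P(y), P(x AND y)) as half-spaces a.z <= b (atom masses >= 0).
CONJUNCTION_HALFSPACES = [
    ([0.0, 0.0, -1.0], 0.0),
    ([-1.0, 0.0, 1.0], 0.0),
    ([0.0, -1.0, 1.0], 0.0),
    ([1.0, 1.0, -1.0], 1.0),
]


def project_conjunction_triple(y, num_cycles=500):
    """Dykstra's algorithm: projection onto the coherent set for a conjunction."""
    x = list(y)
    corrections = [[0.0, 0.0, 0.0] for _ in CONJUNCTION_HALFSPACES]
    for _ in range(num_cycles):
        for k, (a, b) in enumerate(CONJUNCTION_HALFSPACES):
            z = [xi + ci for xi, ci in zip(x, corrections[k])]
            x = project_halfspace(z, a, b)
            corrections[k] = [zi - xi for zi, xi in zip(z, x)]
    return x


def project_negation_pair(y):
    """Projection of (P(x), P(not x)) onto the segment {(t, 1 - t) : 0 <= t <= 1}."""
    t = min(1.0, max(0.0, (y[0] + 1.0 - y[1]) / 2.0))
    return [t, 1.0 - t]


def aggregate(p_hat, subsets, num_iters):
    """Table 4 sketch: returns the auxiliary forecasts q after each sweep (q_0 = p_hat)."""
    q = list(p_hat)
    history = [list(q)]
    for _ in range(num_iters):
        for indices, project in subsets:
            # Only the coordinates in sigma_j change; all others keep their q_i.
            local = project([q[i] for i in indices])
            for i, value in zip(indices, local):
                q[i] = value
        history.append(list(q))
    return history


def brier(forecasts, truths):
    """Brier score (3): squared distance between forecasts and the 0/1 truth vector."""
    return sum((float(t) - f) ** 2 for f, t in zip(forecasts, truths))


# Illustrative (made-up) forecasts for A, not A, B, not B, A AND B.
p_hat = [0.75, 0.40, 0.20, 0.70, 0.60]
subsets = [((0, 1), project_negation_pair),
           ((2, 3), project_negation_pair),
           ((0, 2, 4), project_conjunction_triple)]
history = aggregate(p_hat, subsets, num_iters=20)

# Theorem 2: under every realizable world, the Brier score never increases with T.
for a_true, b_true in product([True, False], repeat=2):
    truths = [a_true, not a_true, b_true, not b_true, a_true and b_true]
    scores = [brier(q, truths) for q in history]
    assert all(s1 <= s0 + 1e-9 for s0, s1 in zip(scores, scores[1:]))

print([round(v, 3) for v in history[-1]])
```

One way to see why the check passes: the truth vector of any realizable world is itself coherent, so it lies in every local coherence set, and projecting onto a convex set never increases the squared distance to a point of that set. Since the Brier score is exactly that squared distance, it cannot increase.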



5 Experiments 

In this section, we empirically validate the aggregation algo- 
rithm presented in Section 4. In particular, our experiments 
focus on two issues: (i) the effect that aggregation (i.e., fu- 
sion) has on the panel's forecasting accuracy and (ii) how 
the algorithm scales to large data sets, i.e., how "fast" the 
algorithm is in practice. 

5.1 The Data 

Five previously collected data sets will be used in these experiments. The STCK database was first published in [?] and contains forecasts made by MBA students at Rice University on events pertaining to 10 stocks in the third quarter of 2000; the FIN database is documented in [2] and summarizes forecasts made by students at Rice on events related to various economic indicators in the fourth quarter of 2001; the NBA1 and NBA2 data sets appeared in [?] and detail forecasts made by self-proclaimed basketball enthusiasts regarding the outcome of two Houston Rockets National Basketball Association games; the HSTN data set [21] contains forecasts made by Houston homeowners on events pertaining to the local real-estate market and pollution.

In each of the five data sets, subjects were asked to assess the likelihood of 34 randomly selected basic (10) and complex (24) events. The complex events were constrained to have one of the following forms: $p \wedge q$, $p \wedge \neg q$, $p \vee q$, or $p \vee \neg q$. The number of subjects (i.e., the size of the panel) per data set is summarized in Table 5, as is the total number of basic events (i.e., the length of $X$) from which the forecasted events were constructed. Due to the random allocation of events per subject, multiple experts often provided forecasts pertaining to the same event. In Table 5, "Events/Agg." gives the number of forecasts pooled into each panel's aggregate (events counted with multiplicity, i.e., 34 times the number of subjects).





                STCK   FIN    NBA1   NBA2   HSTN
Subjects        47     31     29     36     17
Basic Events    30     10     10     10     10
Events/Agg.     1598   1054   986    1224   578

Table 5: Data Summary

5.2 The Method 

In each of the following experiments, we employ the aggregation algorithm detailed in Section 4. Since in each data set complex events are constrained to one of the forms $p \wedge q$, $p \wedge \neg q$, $p \vee q$, or $p \vee \neg q$, subsets are chosen precisely as illustrated by Figure 3. Interestingly, for these subsets, deterministic rules can be derived for solving each optimization in Step 2 (Table 4); we forgo describing these easily derived rules in the interest of space.

For every forecast reported in each database, the truth value of the corresponding event is known. This allows us to assess the accuracy of various forecasts a posteriori. Here, accuracy is measured using the Brier score (3) and slope, which is defined as follows: if $m_T$ denotes the number of true events in $\{E_i\}_{i=1}^m$, slope measures the stochastic accuracy of the forecasts $\{(E_i, p_i)\}_{i=1}^m$ using

$$\text{slope} = \frac{1}{m_T} \sum_{i : E_i = \text{TRUE}} p_i - \frac{1}{m - m_T} \sum_{i : E_i = \text{FALSE}} p_i;$$

higher slope indicates more accurate forecasts. We assess the accuracy of forecasts in four cases of interest.

• Raw: the accuracy of the judges' raw forecasts. The average accuracy of each judge's unprocessed forecasts is reported.

• Individual: the accuracy after eliminating intra-judge incoherence (i.e., after running the algorithm on each individual judge). The average accuracy of the judges' processed forecasts is reported.

• Aggregate: the accuracy after aggregation using our method. The accuracy of each judge is assessed after replacing her original forecasts with the aggregate forecasts (for the same events); the average of the judges' scores is reported.

• Linear Avg.: the accuracy of the linearly averaged aggregate. The accuracy of each judge is assessed after replacing her original forecasts with the linearly averaged aggregate (again, for the same events); the average of the judges' scores is reported.
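The "replace, then score each judge" protocol shared by the Aggregate and Linear Avg. cases can be sketched as follows. The data layout (a dict from judge to a dict from event to probability), the toy panel, and the function names are hypothetical, and each judge is scored with a mean-squared-error Brier score.

```python
from collections import defaultdict
from statistics import mean


def linear_average(panel):
    """Average, event by event, the forecasts of every judge who assessed that event."""
    by_event = defaultdict(list)
    for forecasts in panel.values():
        for event, p in forecasts.items():
            by_event[event].append(p)
    return {event: mean(ps) for event, ps in by_event.items()}


def average_judge_brier(panel, truth, replacement=None):
    """Mean over judges of each judge's Brier score on the events she assessed.

    If `replacement` is given, a judge's forecast for an event is replaced by
    replacement[event] before scoring (Aggregate / Linear Avg. cases);
    otherwise her own forecasts are scored (Raw case).
    """
    per_judge = []
    for forecasts in panel.values():
        probs = {e: (replacement[e] if replacement is not None else p)
                 for e, p in forecasts.items()}
        per_judge.append(mean((p - truth[e]) ** 2 for e, p in probs.items()))
    return mean(per_judge)


# Toy panel: judge A never assessed e3 and judge B never assessed e2.
panel = {"A": {"e1": 0.9, "e2": 0.4}, "B": {"e1": 0.5, "e3": 0.2}}
truth = {"e1": 1, "e2": 0, "e3": 0}

raw = average_judge_brier(panel, truth)
lin = average_judge_brier(panel, truth, linear_average(panel))
print(f"Raw: {raw:.3f}  Linear Avg.: {lin:.3f}")
```

In this toy panel, substituting the linear average lowers the average judge's score from 0.115 to 0.095, because judge B benefits from A's sharper forecast on e1.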

Note that when accuracy is measured with slope, the score reported for Linear Avg. is the same as that reported for Raw.

5.3 Experiment 1: Scalability 

Figures 4 and 5 plot the average Brier score achieved by the panel vs. the number of iterations (T) made by our algorithm, in the Individual and Aggregate cases, respectively.
Note that the monotonicity of these plots is predicted by 
Theorem 2. In both cases and in every data set, the algo- 
rithm converges within 10 iterations through the forecasts. 
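The geometric fact behind this kind of monotonicity can be illustrated on a toy problem. The sketch below is not the algorithm of Section 4 and does not reproduce Theorem 2. It runs plain cyclic projections onto two overlapping local coherent sets, {p, q, p ∧ q} and {q, r, q ∨ r}, using a brute-force face-enumeration projection onto each convex hull of truth-table vertices. Because every local set contains the truth vector, and projecting onto a closed convex set never increases the distance to any point of that set, the squared error cannot increase between projections. The events, forecasts, and truth values are hypothetical. Plain cyclic projection reaches a coherent point but not, in general, the nearest one, which is what Dykstra-type corrections are for.

```python
from itertools import combinations

import numpy as np


def hull_projection(x, vertices, tol=1e-12):
    """Euclidean projection of x onto conv(vertices) by brute-force face
    enumeration: exact, but only practical for a handful of vertices."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(vertices, dtype=float)
    best, best_dist = None, np.inf
    for k in range(1, len(V) + 1):
        for idx in combinations(range(len(V)), k):
            v0 = V[idx[0]]
            if k == 1:
                y = v0
            else:
                D = V[list(idx[1:])] - v0  # edge directions from v0
                # Least-squares projection onto the affine hull of this subset.
                c = np.linalg.lstsq(D.T, x - v0, rcond=None)[0]
                # Keep it only if its barycentric weights are all nonnegative.
                if np.any(c < -tol) or c.sum() > 1.0 + tol:
                    continue
                y = v0 + D.T @ c
            dist = np.linalg.norm(x - y)
            if dist < best_dist:
                best, best_dist = y, dist
    return best


def truth_vertices(rule):
    """Rows (p, q, rule(p, q)) over the four truth assignments of (p, q)."""
    return [(p, q, int(rule(p, q))) for p in (0, 1) for q in (0, 1)]


# Hypothetical toy problem. Coordinates: 0 = p, 1 = q, 2 = r, 3 = p AND q, 4 = q OR r.
SUBSETS = [
    ((0, 1, 3), truth_vertices(lambda a, b: a and b)),
    ((1, 2, 4), truth_vertices(lambda a, b: a or b)),
]


def sweeps(x, n_sweeps):
    """Cyclically project onto each local coherent set; yield x after each sweep."""
    x = np.array(x, dtype=float)
    for _ in range(n_sweeps):
        for idx, verts in SUBSETS:
            x[list(idx)] = hull_projection(x[list(idx)], verts)
        yield x.copy()


forecast = np.array([0.9, 0.2, 0.6, 0.5, 0.4])  # P(p AND q) > P(q), P(q OR r) < P(r)
truth = np.array([1.0, 0.0, 1.0, 0.0, 1.0])     # p and r true, q false

scores = [float(np.sum((forecast - truth) ** 2))]
scores += [float(np.sum((x - truth) ** 2)) for x in sweeps(forecast, 10)]
print([round(s, 4) for s in scores])
```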

From a computational perspective, the most interest- 
ing data set is the STCK database, since it contains the 
largest number of unique events per aggregate and the most 
basic events. On a 1 GHz PowerPC G4, aggregating the database of 1598 forecasts took approximately 10 s. In contrast, the rival method SAPA [?] was reported to take multiple hours. Incidentally, eliminating incoherence from the individual judges took less than 0.6 s.

5.4 Experiment 2: Forecasting Accuracy 

Osherson and Vardi [16] report three important empirical findings. First, they observe that eliminating intra-judge incoherence improves the forecasting accuracy of individual judges (i.e., Individual is better than Raw). Second, they observe that panel aggregation improves the forecasting accuracy of panel members (i.e., Aggregate improves over Raw). Finally, [16] reports that aggregation improves the accuracy of panel members as compared to incoherence-corrected forecasts (i.e., Aggregate improves over Individual). As discussed in part in [16], these findings are anticipated by de Finetti's theorem [11] when accuracy is assessed using the Brier score. However, Osherson and Vardi's findings also hold up under alternative accuracy measurements, including slope.

Figure 4: Individual: average Brier score vs. T.

Figure 5: Aggregate: average Brier score vs. T.

In the previous section, we documented a speed-up of several orders of magnitude. Here, we ask whether this has been achieved at the expense of accuracy; in particular, whether Osherson and Vardi's empirical observations hold up when using our method. Tables 6 and 7 summarize the results for Brier score and slope, respectively. These results agree with the findings of Osherson and Vardi, except that the aggregate slopes are not consistently higher than those obtained by applying our algorithm to individual judges. As reported in [16] for the STCK data set, the SAPA method yielded an average per-subject accuracy of 0.276 (Individual), while the "optimal" CAP calculation computed using quadratic programming yielded 0.272 (Individual); our method yields 0.273 (Individual, Table 6). We thus conclude that the proposed method provides a significant computational speed-up while achieving competitive forecasting gains.

              STCK    FIN   NBA1   NBA2   HSTN
Raw          0.309  0.243  0.239  0.228  0.318
Individual   0.273  0.220  0.205  0.207  0.257
Aggregate    0.245  0.200  0.188  0.191  0.220
Linear Avg.  0.286  0.207  0.203  0.196  0.234

Table 6: Forecasting Accuracy: Brier Score





              STCK    FIN   NBA1   NBA2   HSTN
Raw          0.064  0.153  0.140  0.141  0.129
Individual   0.109  0.172  0.186  0.169  0.210
Aggregate    0.114  0.153  0.173  0.150  0.202

Table 7: Forecasting Accuracy: Slope
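The comparisons drawn from Tables 6 and 7 can be checked mechanically. The sketch below transcribes the two tables and lists, for a pair of rows, the data sets on which one row strictly beats the other (lower is better for Brier score, higher for slope); the helper name is hypothetical.

```python
DATASETS = ["STCK", "FIN", "NBA1", "NBA2", "HSTN"]

# Transcribed from Table 6 (Brier score; lower is better).
BRIER = {
    "Raw":        [0.309, 0.243, 0.239, 0.228, 0.318],
    "Individual": [0.273, 0.220, 0.205, 0.207, 0.257],
    "Aggregate":  [0.245, 0.200, 0.188, 0.191, 0.220],
    "Linear Avg": [0.286, 0.207, 0.203, 0.196, 0.234],
}

# Transcribed from Table 7 (slope; higher is better).
SLOPE = {
    "Raw":        [0.064, 0.153, 0.140, 0.141, 0.129],
    "Individual": [0.109, 0.172, 0.186, 0.169, 0.210],
    "Aggregate":  [0.114, 0.153, 0.173, 0.150, 0.202],
}


def datasets_where(table, better, worse, higher_is_better):
    """Data sets on which row `better` strictly beats row `worse`."""
    wins = []
    for name, b, w in zip(DATASETS, table[better], table[worse]):
        if (b > w) if higher_is_better else (b < w):
            wins.append(name)
    return wins


for name, raw, agg in zip(DATASETS, BRIER["Raw"], BRIER["Aggregate"]):
    print(f"{name}: Brier score reduced by {100 * (raw - agg) / raw:.1f}% (Raw to Aggregate)")

print(datasets_where(SLOPE, "Aggregate", "Individual", higher_is_better=True))
```

On these numbers, the Brier score orderings Raw > Individual > Aggregate and Linear Avg. > Aggregate hold on every data set, whereas the aggregate slope exceeds the individual slope only on STCK and merely ties the raw slope on FIN.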



6 Discussion 

An underlying assumption of the current study is that the Brier score (i.e., squared error) is the appropriate measure for assessing forecasting accuracy and probabilistic (in)coherence. However, de Finetti's theorem and the von Neumann-Halperin algorithm (and generalizations such as Dykstra's algorithm) have all been extended to a wide class of distance measures known as Bregman divergences [3] (which include the Brier score and relative entropy as special cases); for details, see [5] and [?]. As a result, our methods and analysis can be generalized to accommodate a large class of alternative accuracy measurements.
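For reference, the standard definition, stated for illustration with our own notation $\varphi$: for a strictly convex, differentiable $\varphi$, the Bregman divergence is

$$D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla\varphi(y),\, x - y \rangle .$$

With $\varphi(x) = \|x\|^2$ we have $\nabla\varphi(y) = 2y$ and

$$D_\varphi(x, y) = \|x\|^2 - \|y\|^2 - 2\langle y,\, x - y \rangle = \|x - y\|^2 ,$$

the squared error underlying the Brier score. With $\varphi(x) = \sum_i x_i \log x_i$ on the positive orthant, $\nabla\varphi(y)_i = \log y_i + 1$ and

$$D_\varphi(x, y) = \sum_i \Big( x_i \log \frac{x_i}{y_i} - x_i + y_i \Big),$$

which reduces to the relative entropy when $\sum_i x_i = \sum_i y_i$.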

The message-passing algorithm derived in Section 4 is reminiscent of belief propagation, the sum-product algorithm, and junction trees more generally.² It is thus natural
to ask (i) whether CAP could be solved using an appropriate 
factor graph representation and the junction tree algorithm 
and (ii) whether the algorithm derived in Section 4 can be 
viewed as an instantiation of one such approach. Address- 
ing (ii) may require one to interpret alternating projection 
algorithms in the context of the junction-tree algorithm ap- 
plied to a factor graph representation of our local coherence 
constraints. 

Since researchers in wireless sensor networks are inter- 
ested in similar aggregation problems, it is natural to ask 
whether these tools are applicable in a WSN setting where 
the "experts" are electro-mechanical sensors. If in a given 
WSN application, sensors provide forecasts of probability 
for both logically simple and complex events, then these 
tools are immediately applicable. However, the general idea 
of relaxing a projection by exploiting an underlying notion 
of locality is more widely applicable. For example, in [17], a distributed algorithm is constructed for collaboratively training least-squares kernel regression estimators. As above, that algorithm was derived by applying alternating projection algorithms to a network-topology-dependent relaxation of the classical least-squares estimator.